Self-supervised Log Parsing
Authors
Abstract
Logs are extensively used during the development and maintenance of software systems. They collect runtime events and allow tracking of code execution, which enables a variety of critical tasks such as troubleshooting and fault detection. However, large-scale systems generate massive volumes of semi-structured log records, posing a major challenge for automated analysis. Parsing records with free-form text messages into structured templates is the first and crucial step that enables further analysis. Existing approaches rely on log-specific heuristics or manual rule extraction. These are often specialized in parsing certain log types and are thus limited in performance and generalization. We propose a novel technique called NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling (MLM). In the process of parsing, the model extracts summarizations from the logs in the form of a vector embedding. This allows coupling the MLM pre-training with a downstream anomaly detection task. We evaluate the approach on 10 real-world log datasets and compare the results with 12 existing parsing techniques. The results show that NuLog outperforms existing methods with an average parsing accuracy of 99% and achieves the lowest edit distance to the ground truth templates. Additionally, two case studies are conducted to demonstrate the ability of the approach to support log-based anomaly detection in both a supervised and an unsupervised scenario. The results show that NuLog can be successfully used to support troubleshooting tasks. The implementation is available at https://github.com/nulog/nulog.
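The abstract describes the core mechanism of formulating log parsing as masked language modeling: each token of a log message is masked in turn and a model trained on the logs tries to recover it, so tokens recovered with high confidence can be treated as constant template parts while the rest become variable parameters. The sketch below illustrates this idea; it is a minimal illustration, not the authors' implementation, and the class names, the "<*>" placeholder, the Transformer hyperparameters, and the confidence threshold are assumptions chosen for brevity.

    import torch
    import torch.nn as nn

    # Minimal sketch of parsing-as-MLM (illustrative; not the NuLog implementation).
    class MaskedLogModel(nn.Module):
        """Tiny Transformer encoder trained to recover masked log tokens."""
        def __init__(self, vocab_size, d_model=64, nhead=4, num_layers=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers)
            self.out = nn.Linear(d_model, vocab_size)

        def forward(self, token_ids):
            # token_ids: (batch, seq_len) integer token indices
            hidden = self.encoder(self.embed(token_ids))
            return self.out(hidden)  # per-position logits over the vocabulary

    def extract_template(model, token_ids, mask_id, id2tok, threshold=0.5):
        """Mask each position in turn; tokens the model recovers with high
        confidence become constant template parts, the rest become '<*>'."""
        template = []
        for pos in range(token_ids.size(1)):
            masked = token_ids.clone()
            masked[0, pos] = mask_id
            with torch.no_grad():
                probs = torch.softmax(model(masked)[0, pos], dim=-1)
            original = token_ids[0, pos].item()
            template.append(id2tok[original] if probs[original] >= threshold else "<*>")
        return " ".join(template)

In practice such a model would first be trained with the same masked-token objective over the whole log dataset; the per-message summarization vector mentioned in the abstract could then be obtained by pooling the encoder's hidden states and reused as input to a downstream anomaly detection model.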
Similar resources
Simple Semi-supervised Dependency Parsing
We present a simple and effective semisupervised method for training dependency parsers. We focus on the problem of lexical representation, introducing features that incorporate word clusters derived from a large unannotated corpus. We demonstrate the effectiveness of our approach in a series of dependency parsing experiments on the Penn Treebank, and we show that our cluster-based features yiel...
Simple Semi-supervised Dependency Parsing
We present a simple and effective semisupervised method for training dependency parsers. We focus on the problem of lexical representation, introducing features that incorporate word clusters derived from a large unannotated corpus. We demonstrate the effectiveness of the approach in a series of dependency parsing experiments on the Penn Treebank and Prague Dependency Treebank, and we show that...
Weakly supervised parsing with rules
This work proposes a new research direction to address the lack of structures in traditional n-gram models. It is based on a weakly supervised dependency parser that can model speech syntax without relying on any annotated training corpus. Labeled data is replaced by a few hand-crafted rules that encode basic syntactic knowledge. Bayesian inference then samples the rules, disambiguating and com...
Semi-Supervised Feature Transformation for Dependency Parsing
In current dependency parsing models, conventional features (i.e. base features) defined over surface words and part-of-speech tags in a relatively high-dimensional feature space may suffer from the data sparseness problem and thus exhibit less discriminative power on unseen data. In this paper, we propose a novel semi-supervised approach to addressing the problem by transforming the base featu...
Improved CCG Parsing with Semi-supervised Supertagging
Current supervised parsers are limited by the size of their labelled training data, making improving them with unlabelled data an important goal. We show how a state-of-the-art CCG parser can be enhanced by predicting lexical categories using unsupervised vector-space embeddings of words. The use of word embeddings enables our model to better generalize from the labelled data, and allows us to ...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-67667-4_8